HM-LDM: A Hybrid-Membership Latent Distance Model
A central aim of modeling complex networks is to accurately embed networks in order to detect structures and predict link and node properties. Latent space models (LSMs) have become prominent frameworks for embedding networks; the latent distance model (LDM) and the latent eigenmodel (LEM) are the most widely used LSM specifications. For latent community detection, the embedding space of LDMs has been endowed with a clustering model, whereas LEMs have been constrained to part-based, non-negative matrix factorization (NMF)-inspired representations promoting community discovery. We presently reconcile LSMs with latent community detection by constraining the LDM representation to the D-simplex, forming the hybrid-membership latent distance model (HM-LDM). We show that for sufficiently large simplex volumes this can be achieved without loss of expressive power, whereas by extending the model to squared Euclidean distances we recover the LEM formulation with constraints promoting part-based representations akin to NMF. Importantly, by systematically reducing the volume of the simplex, the model becomes unique and ultimately leads to hard assignments of nodes to simplex corners. We demonstrate experimentally how the proposed HM-LDM admits accurate node representations in regimes ensuring identifiability and valid community extraction. Importantly, HM-LDM naturally reconciles soft and hard community detection with network embeddings via a simple continuous optimization procedure on a volume-constrained simplex that admits the systematic investigation of trade-offs between hard and mixed-membership community detection.
Comment: Camera-ready version. Accepted for oral presentation at the 11th International Conference on Complex Networks and their Applications, CNA 2
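As a rough illustration of the simplex constraint described above, the sketch below maps unconstrained parameters onto a δ-scaled D-simplex via a softmax and scores links with a logistic latent distance model. The softmax parameterization, the toy sizes, and the bias term `beta` are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, delta = 6, 3, 2.0  # nodes, simplex dimension, simplex scale (toy values)

# Unconstrained parameters, mapped onto the delta-scaled D-simplex via softmax:
# each node's coordinates are non-negative and sum to delta.
theta = rng.normal(size=(n, D + 1))
w = delta * np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)

def link_prob(i, j, beta=1.0):
    """Latent distance model: P(edge) = sigmoid(beta - ||w_i - w_j||)."""
    d = np.linalg.norm(w[i] - w[j])
    return 1.0 / (1.0 + np.exp(-(beta - d)))

# Every node lies on the scaled simplex; shrinking delta pushes nodes
# toward hard assignments at the simplex corners.
assert np.allclose(w.sum(axis=1), delta)
```

Shrinking `delta` tightens the representation toward hard memberships, which is the trade-off knob the abstract describes.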
: Random Walk Diffusion meets Hashing for Scalable Graph Embeddings
Learning node representations is a crucial task with a plethora of interdisciplinary applications. Nevertheless, as the size of the networks increases, most widely used models face computational challenges in scaling to large networks. While there has been a recent effort towards designing algorithms that solely deal with scalability issues, most of them behave poorly in terms of accuracy on downstream tasks. In this paper, we aim to study models that balance the trade-off between efficiency and accuracy. In particular, we propose , a scalable embedding model that computes binary node representations. The proposed model exploits random walk diffusion probabilities via stable random projection hashing to efficiently compute embeddings in the Hamming space. Our extensive experimental evaluation on various graphs demonstrates that the proposed model achieves a good balance between accuracy and efficiency compared to well-known baseline models on two downstream tasks.
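The diffusion-plus-hashing idea can be sketched as follows: average transition-matrix powers to obtain random-walk diffusion probabilities, then project them onto random stable (here Cauchy) directions and keep only the signs, yielding binary codes comparable in Hamming space. The toy graph, walk length, and code length are assumptions, and this is one simple variant of stable random projection hashing, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(1)
A = (rng.random((8, 8)) < 0.3).astype(float)  # toy undirected adjacency (assumed)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)

# Random-walk diffusion: average of transition-matrix powers up to length L.
P = A / np.maximum(A.sum(axis=1, keepdims=True), 1)
L = 3
diff = sum(np.linalg.matrix_power(P, k) for k in range(1, L + 1)) / L

# Stable random projection hashing: project the diffusion vectors onto
# Cauchy-distributed directions and keep only the signs as binary codes.
code_len = 16
R = rng.standard_cauchy(size=(8, code_len))
codes = (diff @ R >= 0).astype(np.uint8)  # binary node embeddings

def hamming(i, j):
    """Distance between two nodes in the Hamming space of binary codes."""
    return int(np.count_nonzero(codes[i] != codes[j]))
```

Because the representations are binary, downstream similarity queries reduce to cheap bitwise comparisons rather than floating-point distances.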
Topic-aware latent models for representation learning on networks
Network representation learning (NRL) methods have received significant attention in recent years thanks to their success in several graph analysis problems, including node classification, link prediction and clustering. Such methods aim to map each vertex of the network into a low-dimensional space in a way that preserves the structural information of the network. Of particular interest are methods based on random walks; such methods transform the network into a collection of node sequences, aiming to learn node representations by predicting the context of each node within its sequence. In this paper, we introduce TNE, a generic framework to enhance the embeddings of nodes acquired by means of random walk-based approaches with topic-based information. Similar to the concept of topical word embeddings in natural language processing, the proposed model first assigns each node to a latent community with the help of various statistical graph models and community detection methods, and then learns the enhanced topic-aware representations. We evaluate our methodology on two downstream tasks: node classification and link prediction. The experimental results demonstrate that by incorporating node and community embeddings, we are able to outperform widely known baseline NRL models.
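A minimal sketch of the topic-enhancement step: given node embeddings and a community assignment, build a topic embedding per community and concatenate it with each node's own embedding. TNE learns the topic embeddings rather than taking centroids; the centroid choice, the placeholder inputs, and all sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, K = 10, 4, 3                  # nodes, embedding dim, communities (toy)
node_emb = rng.normal(size=(n, d))  # stand-in for random walk-based embeddings
community = np.arange(n) % K        # stand-in for a community detector's output

# One simple choice of topic embedding: the centroid of the community's members.
topic_emb = np.stack([node_emb[community == k].mean(axis=0) for k in range(K)])

# Enhanced, topic-aware representation: each node's embedding concatenated
# with the embedding of the community (topic) it belongs to.
enhanced = np.hstack([node_emb, topic_emb[community]])
```

The enhanced vectors can then be fed to any downstream classifier or link predictor in place of the plain node embeddings.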
Kernel Node Embeddings
Learning representations of nodes in a low-dimensional space is a crucial task with many interesting applications in network analysis, including link prediction and node classification. Two popular approaches to this problem are matrix factorization and random walk-based models. In this paper, we aim to bring together the best of both worlds towards learning latent node representations. In particular, we propose a weighted matrix factorization model which encodes random walk-based information about the nodes of the graph. The main benefit of this formulation is that it allows utilizing kernel functions in the computation of the embeddings. We perform an empirical evaluation on real-world networks, showing that the proposed model outperforms baseline node embedding algorithms in two downstream machine learning tasks.
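The kernelized formulation can be sketched as fitting the kernel Gram matrix of the embeddings to a random-walk co-occurrence matrix. The Gaussian kernel, the random stand-in for the co-occurrence matrix, and the plain squared loss below are illustrative assumptions rather than the paper's exact objective.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, sigma = 6, 2, 1.0
M = rng.random((n, n))       # stand-in for a random-walk co-occurrence matrix
X = rng.normal(size=(n, d))  # node embeddings to be learned

def gaussian_kernel(X):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

# Weighted-MF-style objective: make the kernel similarities between
# embeddings reproduce the random-walk co-occurrence statistics.
loss = ((M - gaussian_kernel(X)) ** 2).mean()
```

Swapping `gaussian_kernel` for another kernel changes the geometry of the embedding space without altering the rest of the pipeline, which is the flexibility the abstract highlights.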
Learning Node Embeddings with Exponential Family Distributions
Representing networks in a low-dimensional latent space is a crucial task with many interesting applications in graph learning problems, such as link prediction and node classification. A widely applied network representation learning paradigm is based on the combination of random walks with the traditional Skip-Gram approach, modeling center-context node relationships. In this paper, we focus on exponential family distributions to capture rich interaction patterns between nodes in random walk sequences. We introduce the generic exponential family graph embedding (EFGE) model, which generalizes random walk-based network representation learning techniques to exponential family conditional distributions. Our experimental evaluation demonstrates that the proposed technique outperforms well-known baseline methods in two downstream machine learning tasks.
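One concrete exponential-family instance can be sketched with a Poisson model of center-context co-occurrence counts, whose natural parameter is the inner product of center and context embeddings. The Poisson choice is an assumption for illustration (the model defines its own instances), and the counts here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 5, 3
X = rng.normal(scale=0.1, size=(n, d))  # center-node embeddings
Y = rng.normal(scale=0.1, size=(n, d))  # context-node embeddings
counts = rng.integers(0, 4, size=(n, n)).astype(float)  # toy co-occurrences

# Poisson instance: counts[u, v] ~ Poisson(exp(x_u . y_v)),
# i.e. the natural parameter is the center-context inner product.
eta = X @ Y.T                                # natural parameters
loglik = (counts * eta - np.exp(eta)).sum()  # log-likelihood up to log(k!) terms
grad_X = (counts - np.exp(eta)) @ Y          # gradient for the center embeddings
```

Maximizing `loglik` by gradient ascent on `X` and `Y` recovers a Skip-Gram-like training loop in which the conditional distribution can be swapped for any other exponential family member.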
TNE: A Latent Model for Representation Learning on Networks
Network representation learning (NRL) methods aim to map each vertex into a low-dimensional space while preserving both the local and global structure of a given network. In recent years, various approaches based on random walks have been proposed to learn node embeddings, thanks to their success in several challenging problems. In this paper, we introduce a general framework to enhance node embeddings acquired by means of random walk-based approaches. Similar to the notion of topical word embeddings in NLP, the proposed framework assigns each vertex to a topic with the help of various statistical models and community detection methods, and then generates the enhanced community representations. We evaluate our method on two downstream tasks: node classification and link prediction. The experimental results demonstrate that the incorporation of vertex and topic embeddings outperforms widely known baseline NRL methods.
Exponential Family Graph Embeddings
Representing networks in a low-dimensional latent space is a crucial task with many interesting applications in graph learning problems, such as link prediction and node classification. A widely applied network representation learning paradigm is based on the combination of random walks for sampling context nodes and the traditional Skip-Gram model to capture center-context node relationships. In this paper, we focus on exponential family distributions to capture rich interaction patterns between nodes in random walk sequences. We introduce the generic exponential family graph embedding model, which generalizes random walk-based network representation learning techniques to exponential family conditional distributions. We study three particular instances of this model, analyzing their properties and showing their relationship to existing unsupervised learning models. Our experimental evaluation on real-world datasets demonstrates that the proposed techniques outperform well-known baseline methods in two downstream machine learning tasks.